QSOD: Hybrid Policy Gradient for Deep Multi-agent Reinforcement Learning

Authors

Abstract

When individuals interact with one another to accomplish specific goals, they learn from others' experiences to achieve the tasks at hand. The same holds for learning in virtual environments, such as video games. Deep multi-agent reinforcement learning shows promising results in terms of completing many challenging tasks. To demonstrate its viability, most algorithms use value decomposition for multiple agents: to guide each agent's behavior, the combined Q-value of the agents is decomposed into individual agent Q-values. A different mixing method can be utilized, using a monotonicity assumption, based on QMIX and QVMix. However, these methods select actions through a greedy policy, and agents that require large numbers of training trials are not addressed. In this paper, we propose a novel hybrid policy for action selection in an approach known as Q-value Selection Optimization for DRL (QSOD). A grey wolf optimizer (GWO) is used to determine the choice of the individuals' actions. As in the GWO, there is proper attention among the agents, which facilitates their coordination with one another. We use the StarCraft 2 Learning Environment to compare our proposed algorithm with state-of-the-art algorithms. Experimental results show that it outperforms QVMix in all scenarios and requires fewer training trials.
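As a rough illustration of the action-selection idea in the abstract, the sketch below applies a textbook grey wolf optimizer to choose a joint action that maximizes the sum of the agents' individual Q-values. This is a plausible reading of the abstract rather than the published QSOD procedure: the function name gwo_select_actions, the population size, the iteration count, and the use of a plain sum as the fitness (instead of, say, a learned monotonic mixing network) are all illustrative assumptions.

import numpy as np

def gwo_select_actions(agent_q_values, num_wolves=20, num_iters=30, seed=None):
    """Pick a joint discrete action with a grey wolf optimizer (GWO).

    agent_q_values: array of shape (n_agents, n_actions) holding each agent's
    individual Q-values for the current observations.

    Illustrative sketch only: the fitness here is the plain sum of individual
    Q-values; the actual QSOD algorithm may score candidates differently.
    """
    rng = np.random.default_rng(seed)
    q = np.asarray(agent_q_values, dtype=float)
    n_agents, n_actions = q.shape

    # Wolves are continuous positions in [0, n_actions - 1]^n_agents;
    # rounding a position yields a candidate joint (discrete) action.
    wolves = rng.uniform(0, n_actions - 1, size=(num_wolves, n_agents))

    def fitness(position):
        actions = np.clip(np.rint(position), 0, n_actions - 1).astype(int)
        return q[np.arange(n_agents), actions].sum()

    for t in range(num_iters):
        a = 2.0 * (1.0 - t / num_iters)        # exploration factor decays from 2 to 0
        scores = np.array([fitness(w) for w in wolves])
        order = np.argsort(scores)[::-1]       # maximise the joint Q estimate
        alpha, beta, delta = wolves[order[:3]]  # the three best wolves lead the pack

        for i in range(num_wolves):
            new_pos = np.zeros(n_agents)
            for leader in (alpha, beta, delta):
                r1, r2 = rng.random(n_agents), rng.random(n_agents)
                A = 2.0 * a * r1 - a
                C = 2.0 * r2
                D = np.abs(C * leader - wolves[i])
                new_pos += leader - A * D
            wolves[i] = np.clip(new_pos / 3.0, 0, n_actions - 1)

    best = max(wolves, key=fitness)
    return np.clip(np.rint(best), 0, n_actions - 1).astype(int)

# Example: 3 agents with 5 actions each, Q-values coming from the agent networks.
joint_action = gwo_select_actions(np.random.randn(3, 5), seed=0)

In a QMIX-style pipeline the fitness would more plausibly query the monotonic mixing network, so that the joint action chosen by the optimizer is scored consistently with the decomposed Q-values.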


Similar Articles

Parameter Sharing Deep Deterministic Policy Gradient for Cooperative Multi-agent Reinforcement Learning

Deep reinforcement learning for multi-agent cooperation and competition has been a hot topic recently. This paper focuses on the cooperative multi-agent problem based on actor-critic methods under local-observation settings. Multi-agent deep deterministic policy gradient has obtained state-of-the-art results for some multi-agent games; however, it cannot scale well with a growing number of agents. In order ...


Multi-Agent Deep Reinforcement Learning

This work introduces a novel approach for solving reinforcement learning problems in multi-agent settings. We propose a state reformulation of multi-agent problems in R that allows the system state to be represented in an image-like fashion. We then apply deep reinforcement learning techniques with a convolutional neural network as the Q-value function approximator to learn distributed multi-agen...
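The excerpt describes reformulating the joint state so it can be treated as an image and approximating the Q-function with a convolutional network. A minimal PyTorch sketch of that general idea, with the class name, layer sizes, and grid dimensions chosen purely for illustration (none of them come from the cited paper), might look like this:

import torch
import torch.nn as nn

class ConvQNetwork(nn.Module):
    """Illustrative CNN Q-value approximator for an image-like multi-agent state.

    The input is a grid of shape (channels, height, width), e.g. one channel
    per feature map (agent positions, goals, obstacles); the output is one
    Q-value per discrete action. All sizes are arbitrary placeholders.
    """

    def __init__(self, in_channels=3, grid_size=10, num_actions=5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Flatten(),
        )
        self.head = nn.Sequential(
            nn.Linear(32 * grid_size * grid_size, 128), nn.ReLU(),
            nn.Linear(128, num_actions),
        )

    def forward(self, state_image):
        # state_image: (batch, channels, height, width) -> (batch, num_actions)
        return self.head(self.features(state_image))

# One forward pass on a batch of 4 image-like states, yielding shape (4, 5).
q_values = ConvQNetwork()(torch.zeros(4, 3, 10, 10))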


Multi-agent Learning and the Reinforcement Gradient

The number of proposed reinforcement learning algorithms appears to be ever-growing. This article tackles this diversification by showing a persistent principle in several independent reinforcement learning algorithms that have been applied to multi-agent settings. While their learning structure may look very diverse, algorithms such as Gradient Ascent, Cross learning, variations of Q-learning a...


Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning

Off-policy model-free deep reinforcement learning methods using previously collected data can improve sample efficiency over on-policy policy gradient techniques. On the other hand, on-policy algorithms are often more stable and easier to use. This paper examines, both theoretically and empirically, approaches to merging on- and off-policy updates for deep reinforcement learning. Theoretical resu...
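The merging described in this excerpt is commonly expressed as a convex combination of an on-policy likelihood-ratio gradient and an off-policy, critic-based gradient. Written schematically, in our own notation rather than necessarily the paper's:

\[
\nabla_\theta J(\theta) \;\approx\; (1-\nu)\,\mathbb{E}_{s,a\sim\pi}\!\big[\nabla_\theta \log \pi_\theta(a\mid s)\,\hat{A}(s,a)\big] \;+\; \nu\,\mathbb{E}_{s\sim\beta}\!\big[\nabla_\theta \bar{Q}_w\big(s,\pi_\theta(s)\big)\big],
\]

where \(\hat{A}\) is an advantage estimate from on-policy rollouts, \(\bar{Q}_w\) is a learned critic evaluated on states drawn from a replay (behavior) distribution \(\beta\), and the interpolation weight \(\nu \in [0,1]\) trades the sample efficiency of the off-policy term against the stability of the on-policy term.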


Lenient Multi-Agent Deep Reinforcement Learning

Much of the success of single agent deep reinforcement learning (DRL) in recent years can be attributed to the use of experience replay memories (ERM), which allow Deep Q-Networks (DQNs) to be trained efficiently through sampling stored state transitions. However, care is required when using ERMs for multi-agent deep reinforcement learning (MA-DRL), as stored transitions can become outdated bec...



Journal

Journal title: IEEE Access

Year: 2021

ISSN: 2169-3536

DOI: https://doi.org/10.1109/access.2021.3113350